Overview

Dataset statistics

Number of variables: 13
Number of observations: 15326
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 104.0 B
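
The two memory figures can be cross-checked with simple arithmetic. The sketch below assumes each of the 13 columns stores one 8-byte value per row (for example int64, float64, or an object pointer). The report does not list dtypes, so treat that width as an assumption.

```python
# Cross-check the reported memory figures.
# Assumption (not stated in the report): every column uses 8 bytes per row.
N_ROWS = 15326
N_COLS = 13
BYTES_PER_VALUE = 8  # assumed width, e.g. int64 / float64

record_bytes = N_COLS * BYTES_PER_VALUE   # bytes needed for one row
total_bytes = N_ROWS * record_bytes       # bytes needed for all rows
total_mib = total_bytes / 1024 ** 2       # convert bytes to MiB

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
```

Under that assumption the result is 104 B per record and about 1.5 MiB in total, which agrees with the figures above.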

Variable types

Numeric: 9
Categorical: 4

Alerts

city is highly correlated with city_development_index (High correlation)
city_development_index is highly correlated with city (High correlation)
relevent_experience is highly correlated with last_new_job (High correlation)
last_new_job is highly correlated with relevent_experience (High correlation)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
major_discipline has 202 (1.3%) zeros (Zeros)
experience has 445 (2.9%) zeros (Zeros)
company_size has 1687 (11.0%) zeros (Zeros)
company_type has 519 (3.4%) zeros (Zeros)
last_new_job has 6519 (42.5%) zeros (Zeros)
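
Each Zeros percentage is the column's zero count divided by the 15326 observations. A minimal sketch that recomputes the percentages from the counts listed in the alerts (the dictionary and variable names are illustrative and not part of pandas-profiling):

```python
# Recompute the "Zeros" percentages from the counts shown in the alerts.
N_OBSERVATIONS = 15326

zero_counts = {
    "major_discipline": 202,
    "experience": 445,
    "company_size": 1687,
    "company_type": 519,
    "last_new_job": 6519,
}

# Share of zeros per column, in percent, rounded to one decimal place.
zero_share = {
    name: round(100 * count / N_OBSERVATIONS, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_share.items():
    print(f"{name}: {pct}%")
```

If these columns hold integer-encoded categories, which their small integer ranges suggest but the report does not confirm, a zero is simply one category code rather than an empty measurement.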

Reproduction

Analysis started: 2022-02-20 15:11:22.200691
Analysis finished: 2022-02-20 15:11:42.159983
Duration: 19.96 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
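
The reported duration follows directly from the two timestamps. A small standard-library check (the variable names are illustrative):

```python
from datetime import datetime

# Timestamp layout used in the Reproduction block above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-02-20 15:11:22.200691", FMT)
finished = datetime.strptime("2022-02-20 15:11:42.159983", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```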

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 15326
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9561.503719
Minimum: 0
Maximum: 19157
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 945.25
Q1: 4751.5
Median: 9567.5
Q3: 14354.75
95-th percentile: 18194.75
Maximum: 19157
Range: 19157
Interquartile range (IQR): 9603.25

Descriptive statistics

Standard deviation: 5540.3585
Coefficient of variation (CV): 0.5794442655
Kurtosis: -1.204649403
Mean: 9561.503719
Median Absolute Deviation (MAD): 4801
Skewness: 0.001740770794
Sum: 146539606
Variance: 30695572.3
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
17855                 1      < 0.1%
8367                  1      < 0.1%
3521                  1      < 0.1%
4482                  1      < 0.1%
14442                 1      < 0.1%
2681                  1      < 0.1%
16789                 1      < 0.1%
4463                  1      < 0.1%
4142                  1      < 0.1%
2231                  1      < 0.1%
Other values (15316)  15316  99.9%

Value                 Count  Frequency (%)
0                     1      < 0.1%
1                     1      < 0.1%
3                     1      < 0.1%
4                     1      < 0.1%
5                     1      < 0.1%
6                     1      < 0.1%
7                     1      < 0.1%
8                     1      < 0.1%
12                    1      < 0.1%
13                    1      < 0.1%

Value                 Count  Frequency (%)
19157                 1      < 0.1%
19156                 1      < 0.1%
19154                 1      < 0.1%
19153                 1      < 0.1%
19152                 1      < 0.1%
19151                 1      < 0.1%
19150                 1      < 0.1%
19148                 1      < 0.1%
19147                 1      < 0.1%
19146                 1      < 0.1%
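
Several of the df_index statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR is Q3 minus Q1. The sketch below recomputes them from the rounded values printed in the report, so agreement is only expected up to rounding.

```python
# Values copied from the df_index section of the report.
n = 15326
total = 146539606
std = 5540.3585
q1, q3 = 4751.5, 14354.75
minimum, maximum = 0, 19157

mean = total / n                 # Sum / number of observations
cv = std / mean                  # coefficient of variation
variance = std ** 2              # squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # Range

print(f"mean={mean:.6f} cv={cv:.6f} variance={variance:.1f}")
print(f"iqr={iqr} range={value_range}")
```

The reported kurtosis of about -1.20 is also close to the excess kurtosis of -1.2 that a continuous uniform distribution has, which is consistent with the UNIFORM alert on this variable.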

city
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 122
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.20161817
Minimum: 0
Maximum: 122
Zeros: 18
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 5
Median: 48
Q3: 64
95-th percentile: 104
Maximum: 122
Range: 122
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 35.51183338
Coefficient of variation (CV): 0.8034057316
Kurtosis: -1.015955959
Mean: 44.20161817
Median Absolute Deviation (MAD): 35
Skewness: 0.4051720284
Sum: 677434
Variance: 1261.09031
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
5                     3493   22.8%
64                    2128   13.9%
48                    1248   8.1%
13                    1062   6.9%
49                    679    4.4%
30                    459    3.0%
95                    347    2.3%
103                   251    1.6%
6                     243    1.6%
4                     243    1.6%
Other values (112)    5173   33.8%

Value                 Count  Frequency (%)
0                     18     0.1%
1                     69     0.5%
2                     220    1.4%
3                     56     0.4%
4                     243    1.6%
5                     3493   22.8%
6                     243    1.6%
7                     65     0.4%
8                     6      < 0.1%
9                     5      < 0.1%

Value                 Count  Frequency (%)
122                   79     0.5%
121                   70     0.5%
120                   88     0.6%
119                   22     0.1%
118                   19     0.1%
117                   36     0.2%
116                   153    1.0%
115                   17     0.1%
114                   56     0.4%
113                   20     0.1%

city_development_index
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 93
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.10622472
Minimum: 0
Maximum: 92
Zeros: 15
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 14
Q1: 37
Median: 82
Q3: 85
95-th percentile: 90
Maximum: 92
Range: 92
Interquartile range (IQR): 48

Descriptive statistics

Standard deviation: 29.14003885
Coefficient of variation (CV): 0.4617617197
Kurtosis: -0.9975403599
Mean: 63.10622472
Median Absolute Deviation (MAD): 8
Skewness: -0.825825199
Sum: 967166
Variance: 849.141864
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
85                    4172   27.2%
14                    2128   13.9%
82                    1248   8.1%
90                    1062   6.9%
27                    546    3.6%
78                    459    3.0%
91                    411    2.7%
67                    347    2.3%
88                    243    1.6%
57                    243    1.6%
Other values (83)     4467   29.1%

Value                 Count  Frequency (%)
0                     15     0.1%
1                     24     0.2%
2                     5      < 0.1%
3                     11     0.1%
4                     4      < 0.1%
5                     11     0.1%
6                     5      < 0.1%
7                     75     0.5%
8                     195    1.3%
9                     47     0.3%

Value                 Count  Frequency (%)
92                    70     0.5%
91                    411    2.7%
90                    1062   6.9%
89                    144    0.9%
88                    243    1.6%
87                    109    0.7%
86                    9      0.1%
85                    4172   27.2%
84                    79     0.5%
83                    155    1.0%

gender
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 898.1 KiB
Category counts: 1.0 (13885), 0.0 (1235), 2.0 (206)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 0.0

Common Values

Value                 Count  Frequency (%)
1.0                   13885  90.6%
0.0                   1235   8.1%
2.0                   206    1.3%

Length

Histogram of lengths of the category

Pie chart

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

relevent_experience
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 868.2 KiB
Category counts: 0 (11038), 1 (4288)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                 Count  Frequency (%)
0                     11038  72.0%
1                     4288   28.0%

Length

Histogram of lengths of the category

Pie chart

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

enrolled_university
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 898.1 KiB
Category counts: 2.0 (11264), 0.0 (3080), 1.0 (982)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 0.0

Common Values

Value                 Count  Frequency (%)
2.0                   11264  73.5%
0.0                   3080   20.1%
1.0                   982    6.4%

Length

Histogram of lengths of the category

Pie chart

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

education_level
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 898.1 KiB
Category counts: 0.0 (9514), 2.0 (3535), 1.0 (1667), 3.0 (358), 4.0 (252)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 4.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value                 Count  Frequency (%)
0.0                   9514   62.1%
2.0                   3535   23.1%
1.0                   1667   10.9%
3.0                   358    2.3%
4.0                   252    1.6%

Length

Histogram of lengths of the category

Pie chart

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

major_discipline
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.716886337
Minimum: 0
Maximum: 5
Zeros: 202
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.9522556656
Coefficient of variation (CV): 0.2018822582
Kurtosis: 11.41194581
Mean: 4.716886337
Median Absolute Deviation (MAD): 0
Skewness: -3.495353968
Sum: 72291
Variance: 0.9067908526
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Value                 Count  Frequency (%)
5                     13838  90.3%
2                     529    3.5%
4                     306    2.0%
1                     267    1.7%
0                     202    1.3%
3                     184    1.2%

Value                 Count  Frequency (%)
0                     202    1.3%
1                     267    1.7%
2                     529    3.5%
3                     184    1.2%
4                     306    2.0%
5                     13838  90.3%

Value                 Count  Frequency (%)
5                     13838  90.3%
4                     306    2.0%
3                     184    1.2%
2                     529    3.5%
1                     267    1.7%
0                     202    1.3%

experience
Real number (ℝ≥0)

ZEROS

Distinct: 22
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.92659533
Minimum: 0
Maximum: 21
Zeros: 445
Zeros (%): 2.9%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 7
Median: 14
Q3: 19
95-th percentile: 21
Maximum: 21
Range: 21
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.609195333
Coefficient of variation (CV): 0.5112866277
Kurtosis: -0.9501779516
Mean: 12.92659533
Median Absolute Deviation (MAD): 5
Skewness: -0.5118464819
Sum: 198113
Variance: 43.68146295
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=22)

Value                 Count  Frequency (%)
21                    2674   17.4%
15                    1132   7.4%
14                    1118   7.3%
13                    1097   7.2%
16                    979    6.4%
11                    906    5.9%
17                    845    5.5%
1                     788    5.1%
19                    785    5.1%
18                    618    4.0%
Other values (12)     4384   28.6%

Value                 Count  Frequency (%)
0                     445    2.9%
1                     788    5.1%
2                     535    3.5%
3                     395    2.6%
4                     314    2.0%
5                     487    3.2%
6                     546    3.6%
7                     390    2.5%
8                     272    1.8%
9                     230    1.5%

Value                 Count  Frequency (%)
21                    2674   17.4%
20                    409    2.7%
19                    785    5.1%
18                    618    4.0%
17                    845    5.5%
16                    979    6.4%
15                    1132   7.4%
14                    1118   7.3%
13                    1097   7.2%
12                    128    0.8%

company_size
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.034712254
Minimum: 0
Maximum: 7
Zeros: 1687
Zeros (%): 11.0%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 4
95-th percentile: 7
Maximum: 7
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.044106366
Coefficient of variation (CV): 0.6735750198
Kurtosis: -0.7513707788
Mean: 3.034712254
Median Absolute Deviation (MAD): 2
Skewness: 0.3067150917
Sum: 46510
Variance: 4.178370837
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Value                 Count  Frequency (%)
4                     3394   22.1%
1                     2910   19.0%
3                     2574   16.8%
0                     1687   11.0%
2                     1634   10.7%
7                     1349   8.8%
5                     1077   7.0%
6                     701    4.6%

Value                 Count  Frequency (%)
0                     1687   11.0%
1                     2910   19.0%
2                     1634   10.7%
3                     2574   16.8%
4                     3394   22.1%
5                     1077   7.0%
6                     701    4.6%
7                     1349   8.8%

Value                 Count  Frequency (%)
7                     1349   8.8%
6                     701    4.6%
5                     1077   7.0%
4                     3394   22.1%
3                     2574   16.8%
2                     1634   10.7%
1                     2910   19.0%
0                     1687   11.0%

company_type
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.321479838
Minimum: 0
Maximum: 5
Zeros: 519
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 4
Median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.376736782
Coefficient of variation (CV): 0.3185799386
Kurtosis: 2.724603401
Mean: 4.321479838
Median Absolute Deviation (MAD): 0
Skewness: -2.020618858
Sum: 66231
Variance: 1.895404166
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Value                 Count  Frequency (%)
5                     11241  73.3%
4                     1833   12.0%
1                     919    6.0%
2                     667    4.4%
0                     519    3.4%
3                     147    1.0%

Value                 Count  Frequency (%)
0                     519    3.4%
1                     919    6.0%
2                     667    4.4%
3                     147    1.0%
4                     1833   12.0%
5                     11241  73.3%

Value                 Count  Frequency (%)
5                     11241  73.3%
4                     1833   12.0%
3                     147    1.0%
2                     667    4.4%
1                     919    6.0%
0                     519    3.4%

last_new_job
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.800991779
Minimum: 0
Maximum: 5
Zeros: 6519
Zeros (%): 42.5%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.943962501
Coefficient of variation (CV): 1.079384439
Kurtosis: -1.404300017
Mean: 1.800991779
Median Absolute Deviation (MAD): 1
Skewness: 0.5147048644
Sum: 27602
Variance: 3.778990207
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Value                 Count  Frequency (%)
0                     6519   42.5%
4                     2681   17.5%
1                     2382   15.5%
5                     2055   13.4%
2                     846    5.5%
3                     843    5.5%

Value                 Count  Frequency (%)
0                     6519   42.5%
1                     2382   15.5%
2                     846    5.5%
3                     843    5.5%
4                     2681   17.5%
5                     2055   13.4%

Value                 Count  Frequency (%)
5                     2055   13.4%
4                     2681   17.5%
3                     843    5.5%
2                     846    5.5%
1                     2382   15.5%
0                     6519   42.5%

training_hours
Real number (ℝ≥0)

Distinct: 241
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.49504111
Minimum: 0
Maximum: 240
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 119.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 22
Median: 46
Q3: 87
95-th percentile: 173
Maximum: 240
Range: 240
Interquartile range (IQR): 65

Descriptive statistics

Standard deviation: 51.76235137
Coefficient of variation (CV): 0.8417321208
Kurtosis: 0.96195172
Mean: 61.49504111
Median Absolute Deviation (MAD): 29
Skewness: 1.22567531
Sum: 942473
Variance: 2679.341019
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value                 Count  Frequency (%)
27                    266    1.7%
17                    241    1.6%
11                    228    1.5%
49                    224    1.5%
21                    220    1.4%
19                    218    1.4%
33                    217    1.4%
20                    217    1.4%
5                     216    1.4%
23                    215    1.4%
Other values (231)    13064  85.2%

Value                 Count  Frequency (%)
0                     4      < 0.1%
1                     82     0.5%
2                     101    0.7%
3                     187    1.2%
4                     85     0.6%
5                     216    1.4%
6                     161    1.1%
7                     184    1.2%
8                     184    1.2%
9                     193    1.3%

Value                 Count  Frequency (%)
240                   7      < 0.1%
239                   11     0.1%
238                   10     0.1%
237                   9      0.1%
236                   10     0.1%
235                   10     0.1%
234                   9      0.1%
233                   11     0.1%
232                   7      < 0.1%
231                   8      0.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
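
The calculation described above can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then apply the covariance-over-standard-deviations formula to the ranks. The function names and the toy data are illustrative and are not taken from pandas-profiling.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data ρ is 1 up to floating-point error because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not linear.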

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
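
A minimal sketch of that formula, together with a check of the invariance property: shifting and positively rescaling either variable leaves r unchanged, while negating one variable flips its sign. The function name and data are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r_original = pearson_r(x, y)
# Separate changes in location and (positive) scale of each variable.
r_rescaled = pearson_r([2 * a + 3 for a in x], [0.5 * b - 1 for b in y])
# Negating one variable reverses the direction of the association.
r_flipped = pearson_r([-a for a in x], y)

print(r_original, r_rescaled, r_flipped)
```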

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
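
The pair-counting definition above corresponds to the variant usually called tau-a. A small sketch, with an illustrative function name and toy data; library implementations often use tie-corrected variants such as tau-b, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)  # 5 concordant pairs, 1 discordant pair, 6 pairs in total
print(tau)
```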

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
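
A pure-Python sketch of the bias-corrected estimator as it is commonly written, operating on a contingency table of counts. Whether pandas-profiling's internal implementation matches it line for line is not confirmed here; the function name and the example tables are illustrative.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

perfect = [[10, 0], [0, 10]]      # each row category maps to one column category
independent = [[5, 5], [5, 5]]    # rows and columns are unrelated

print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The two toy tables hit the ends of the scale: the perfectly associated table gives a value of 1 (up to floating-point error) and the independent table gives 0.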

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

       df_index  city  city_development_index  gender  relevent_experience  enrolled_university  education_level  major_discipline  experience  company_size  company_type  last_new_job  training_hours
0      17855     64    14                      1.0     0                    2.0                  0.0              5.0               1.0         4.0           5.0           0.0           89
1      17664     5     85                      1.0     1                    2.0                  4.0              5.0               15.0        5.0           5.0           5.0           14
2      13404     85    77                      1.0     0                    2.0                  0.0              5.0               3.0         2.0           2.0           4.0           35
3      13366     5     85                      1.0     0                    2.0                  0.0              5.0               15.0        1.0           1.0           0.0           52
4      15670     95    67                      0.0     0                    0.0                  0.0              5.0               15.0        4.0           5.0           0.0           154
5      3857      48    82                      1.0     0                    2.0                  2.0              5.0               17.0        3.0           5.0           4.0           49
6      17115     64    14                      0.0     0                    2.0                  2.0              5.0               20.0        2.0           5.0           5.0           38
7      11516     101   40                      1.0     0                    0.0                  2.0              5.0               16.0        3.0           4.0           5.0           13
8      9457      41    24                      1.0     0                    2.0                  0.0              5.0               10.0        0.0           5.0           0.0           10
9      12825     64    14                      1.0     0                    2.0                  2.0              5.0               14.0        0.0           5.0           0.0           175


Last rows

       df_index  city  city_development_index  gender  relevent_experience  enrolled_university  education_level  major_discipline  experience  company_size  company_type  last_new_job  training_hours
15316  12878     48    82                      1.0     0                    2.0                  0.0              5.0               19.0        1.0           4.0           0.0           57
15317  12914     30    78                      1.0     0                    2.0                  2.0              5.0               21.0        4.0           5.0           0.0           84
15318  17111     5     85                      1.0     0                    1.0                  0.0              5.0               21.0        6.0           5.0           4.0           46
15319  11576     95    67                      0.0     0                    1.0                  0.0              5.0               1.0         3.0           5.0           0.0           186
15320  5365      104   27                      1.0     0                    2.0                  0.0              5.0               1.0         2.0           5.0           3.0           65
15321  10398     95    67                      1.0     0                    2.0                  0.0              5.0               4.0         4.0           5.0           0.0           92
15322  8595      5     85                      0.0     0                    2.0                  2.0              5.0               1.0         4.0           5.0           0.0           15
15323  10566     74    75                      0.0     0                    2.0                  2.0              5.0               5.0         1.0           5.0           0.0           33
15324  3085      64    14                      0.0     0                    2.0                  0.0              5.0               6.0         0.0           1.0           1.0           110
15325  3019      5     85                      1.0     0                    2.0                  2.0              5.0               2.0         1.0           5.0           3.0           81